High Level Multimodal Fusion and Semantics Extraction
Authors
Abstract
This thesis on multimodal fusion and semantics extraction focuses on the automated detection and annotation of harmful content in video data. The aim is not only to infer the presence or absence of violence (i.e., the binary problem), but also to determine the type of violence (e.g., fight, explosion, murder). Acknowledging the lack of knowledge representation and reasoning approaches for the problem at hand, we propose a semantic fusion approach that combines low- to mid-level modality-specific semantics through ontological and rule-based reasoning. A major part of the proposed framework is the segmentation of movies into meaningful and easy-to-handle units. We evaluate a set of shot boundary detection approaches combined through a majority voting scheme. Subsequently, state-of-the-art classification methods are employed to extract audio and visual mid-level semantics. The segmentation and modality-specific analysis algorithms instantiate the corresponding video structure and modality-specific ontologies developed in the context of the knowledge engineering framework. A set of consecutive and interleaved ontological and SWRL rule reasoning steps maps sets and sequences of extracted low- to mid-level semantics onto higher-level concepts represented in the harmful content domain ontology. We present the involved ontologies, the corresponding SWRL rule sets and the reasoning mechanism in detail. Finally, we present the evaluation of the proposed approach on a pre-annotated movie dataset and compare its results with those of the single-modality approaches and a kNN late-fusion meta-classifier. We comment on the higher-level semantics extraction ability and evaluate a set of extensions to the basic structure of the framework. The extensions concern the development of a scene detection module that combines Markov clustering with SQWRL queries, the incorporation of existing rating and movie genre metadata into the violence identification procedure, and the detection of pornography.
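The majority voting scheme for combining shot boundary detectors can be sketched as follows. This is a minimal illustration, not the thesis' exact configuration: the detector names and the strict-majority threshold are assumptions for the example.

```python
def majority_vote_boundaries(detector_outputs, num_frames, min_votes=None):
    """Combine per-detector shot boundary predictions by majority voting.

    detector_outputs: list of sets, each holding the frame indices at which
    one detector reports a cut. A frame is accepted as a shot boundary when
    at least `min_votes` detectors agree (default: strict majority).
    """
    if min_votes is None:
        min_votes = len(detector_outputs) // 2 + 1
    votes = [0] * num_frames
    for boundaries in detector_outputs:
        for frame in boundaries:
            votes[frame] += 1
    return sorted(f for f, v in enumerate(votes) if v >= min_votes)

# Three hypothetical detectors on a 10-frame clip:
pixel_diff = {3, 7}       # pixel-difference detector
histogram  = {3, 7, 9}    # colour-histogram detector
edge_based = {3, 8}       # edge-change detector
cuts = majority_vote_boundaries([pixel_diff, histogram, edge_based], 10)
# frames 3 (3 votes) and 7 (2 votes) reach the strict majority of 2
```

The same pattern extends to weighted voting by replacing the unit increment with a per-detector reliability weight.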
Similar resources
Ontology-based multimodal high level fusion involving natural language analysis for aged people home care application
This paper presents a knowledge-based method of early-stage high level multimodal fusion of data obtained from speech input and visual scene. The ultimate goal is to develop a human-computer multimodal interface to assist elderly people living alone at home to perform their daily activities, and to support their active ageing and social cohesion. Crucial for multimodal high level fusion and suc...
Authentication Using Multimodal Biometric Features
Multimodal biometric systems consolidate multiple biometric sources, enabling better recognition performance than single-modality biometric systems. The information fusion in a multimodal system can be performed at various levels, such as data level fusion, feature level fusion, match score level fusion and decision level fusion. In this paper, we have studied the performance of...
Feature Level Fusion in Biometric Systems
Multimodal biometric systems utilize the evidence presented by multiple biometric sources (e.g., face and fingerprint, multiple fingers of a user, multiple impressions of a single finger, etc.) in order to determine or verify the identity of an individual. Information from multiple sources can be consolidated in three distinct levels [1]: (i) feature extraction level; (ii) match score level; an...
Optimal Face-Iris Multimodal Fusion Scheme
Multimodal biometric systems are considered a way to minimize the limitations raised by single traits. This paper proposes new schemes based on score level, feature level and decision level fusion to efficiently fuse face and iris modalities. Log-Gabor transformation is applied as the feature extraction method on face and iris modalities. At each level of fusion, different schemes are proposed ...
Zhejiang University at TRECVID 2006
We participated in the high-level feature extraction and interactive-search tasks of TRECVID 2006. The interaction and integration of multi-modality media types such as visual, audio and textual data are the essence of video content analysis. Although each single modality expresses only limited semantics, video semantics are fully manifested only by interaction and integ...
Journal title:
Volume, Issue:
Pages: -
Publication year: 2011